Developing phoneme‐based lip‐reading sentences system for silent speech recognition
Authors
Abstract
Lip-reading is the process of interpreting speech by visually analysing lip movements. Recent research in this area has shifted from simple word recognition to lip-reading sentences in the wild. This paper uses phonemes as the classification schema in order to explore an alternative approach and enhance system performance. Different schemas have been investigated, including character-based and viseme-based schemas. The visual front-end model consists of a Spatial-Temporal (3D) convolution followed by a 2D ResNet. Transformers with multi-headed attention are used for the phoneme models, and a Recurrent Neural Network is used for the language model. The performance of the proposed system is evaluated on the BBC Lip Reading Sentences 2 (LRS2) benchmark dataset. Compared with state-of-the-art approaches to lip-reading sentences, it demonstrated a 10% lower error rate on average under varying illumination ratios.
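The abstract outlines a pipeline of a 3D convolution, a frame-wise 2D ResNet, and a Transformer encoder producing phoneme predictions. The following is a minimal PyTorch sketch of that pipeline; the layer sizes, the torchvision ResNet-18 trunk, the 40-phoneme inventory, and all class and parameter names are illustrative assumptions rather than the authors' exact configuration.

```python
# Hypothetical sketch of the visual front-end: a spatio-temporal (3D) convolution,
# a 2D ResNet applied per frame, and a multi-headed-attention Transformer encoder
# emitting per-frame phoneme logits. Details are assumptions, not the paper's spec.
import torch
import torch.nn as nn
from torchvision.models import resnet18


class VisualFrontEnd(nn.Module):
    def __init__(self, num_phonemes: int = 40, d_model: int = 512):
        super().__init__()
        # Spatial-temporal (3D) convolution over the grayscale lip-region clip.
        self.conv3d = nn.Sequential(
            nn.Conv3d(1, 64, kernel_size=(5, 7, 7), stride=(1, 2, 2), padding=(2, 3, 3)),
            nn.BatchNorm3d(64),
            nn.ReLU(inplace=True),
            nn.MaxPool3d(kernel_size=(1, 3, 3), stride=(1, 2, 2), padding=(0, 1, 1)),
        )
        # 2D ResNet trunk applied frame by frame to the 3D-conv feature maps.
        trunk = resnet18(weights=None)
        trunk.conv1 = nn.Conv2d(64, 64, kernel_size=7, stride=2, padding=3, bias=False)
        trunk.fc = nn.Identity()  # keep the 512-dim pooled feature per frame
        self.resnet2d = trunk
        # Transformer encoder with multi-headed self-attention over the frame sequence.
        layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=8, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=6)
        self.phoneme_head = nn.Linear(d_model, num_phonemes)

    def forward(self, clips: torch.Tensor) -> torch.Tensor:
        # clips: (batch, 1, time, height, width) grayscale lip crops
        x = self.conv3d(clips)                       # (B, 64, T, H', W')
        b, c, t, h, w = x.shape
        x = x.transpose(1, 2).reshape(b * t, c, h, w)
        x = self.resnet2d(x).view(b, t, -1)          # (B, T, 512) frame features
        x = self.encoder(x)                          # temporal context via attention
        return self.phoneme_head(x)                  # (B, T, num_phonemes) logits


if __name__ == "__main__":
    model = VisualFrontEnd()
    dummy = torch.randn(2, 1, 29, 96, 96)  # two 29-frame 96x96 clips
    print(model(dummy).shape)               # torch.Size([2, 29, 40])
```

In this sketch the phoneme logits would typically be decoded into sentences with a separate language model (the paper uses a Recurrent Neural Network for that stage), which is omitted here.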
Similar articles
Speech Recognition System For Spoken Japanese Sentences
A speech recognition system for continuously spoken simple Japanese sentences is described. The acoustic analyser, based on a psychological assumption for phoneme identification, can represent the speech sound as a phoneme string in an expanded sense, containing acoustic features such as buzz and silence as well as ordinary phonemes. Each item of the word dictionary is written in Roman letters...
The MUTE silent speech recognition system
sEMG based silent speech recognition has become a desirable communication modality because it has the potential to provide natural, covert, hands-free communication in acoustically challenging environments. To enable this capability, we have developed a portable, self-contained, Android based Mouthed-speech Understanding and Transcription Engine (MUTE) system. To demonstrate the MUTE system’s a...
Towards a practical silent speech recognition system
Our recent efforts towards developing a practical surface electromyography (sEMG) based silent speech recognition interface have resulted in significant advances in the hardware, software and algorithmic components of the system. In this paper, we report our algorithmic progress, specifically: sEMG feature extraction parameter optimization, advances in sEMG acoustic modeling, and sEMG sensor se...
Auxiliary Multimodal LSTM for Audio-visual Speech Recognition and Lipreading
Audio-visual Speech Recognition (AVSR), which employs both video and audio information for Automatic Speech Recognition (ASR), is one of the applications of multimodal learning that makes ASR systems more robust and accurate. Traditional models usually treated AVSR as inference or projection, but strict priors limit their ability. With the revival of deep learning, Deep Neural Networks (DNN) be...
Multi-pose lipreading and audio-visual speech recognition
In this article, we study the adaptation of visual and audio-visual speech recognition systems to non-ideal visual conditions. We focus on overcoming the effects of a changing pose of the speaker, a problem encountered in natural situations where the speaker moves freely and does not keep a frontal pose with relation to the camera. To handle these situations, we introduce a pose normalization b...
Journal
Journal title: CAAI Transactions on Intelligence Technology
Year: 2022
ISSN: 2468-2322, 2468-6557
DOI: https://doi.org/10.1049/cit2.12131